Introduction and Data Summary
This report analyzes three datasets selected for practicing and demonstrating methods of data visualization. The first is a compilation of housing prices from West Roxbury, with supporting features such as number of rooms, number of floors, total square footage, and several similar attributes. A multiple linear regression model predicting the total value of a house is fit to this dataset, and the resulting coefficient estimates are presented. The other two datasets pertain to lakes in Florida: one contains the shapefiles (map geometry) of all Florida lakes, while the other contains measures of water quality for a subset of those lakes. Together, these are used to visualize the water quality parameters of several Polk County lakes.
Methods
West Roxbury Housing Prices
The West Roxbury housing price dataset contains 14 attributes covering core features of a home, such as the number of rooms, bedrooms, kitchens, and floors. Before the regression is run, the data are pre-processed: many of the categorical variables are converted to factors, and some variable names are adjusted to be valid in R’s syntax. An initial model is fit using all of the available variables. This model suggests that the number of rooms and the number of bedrooms may not be as significant as the other features, so those two variables are removed and a second model is fit. In the second model, almost all variables appear significant except for some of the highest factor levels. For example, having one, two, or three fireplaces has a significant effect on total price, but the coefficient for a fourth fireplace is not statistically significant. The same pattern appears for a third floor and a third half bathroom.
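The workflow described above can be sketched in base R. This is a minimal illustration and not the code used for the report: the data are synthetic, and the column names (`TOTAL VALUE`, `LIVING AREA`, `ROOMS`, `BEDROOMS`, `FLOORS`, `FIREPLACE`) are assumptions standing in for the real West Roxbury fields.

```r
# Synthetic stand-in data. All column names and values are hypothetical.
set.seed(1)
n <- 200
houses <- data.frame(
  "LIVING AREA" = round(runif(n, 800, 3500)),
  ROOMS         = sample(4:10, n, replace = TRUE),
  BEDROOMS      = sample(2:5, n, replace = TRUE),
  FLOORS        = sample(c(1, 1.5, 2), n, replace = TRUE),
  FIREPLACE     = sample(0:2, n, replace = TRUE),
  check.names   = FALSE  # keep the space in the name for now
)
houses[["TOTAL VALUE"]] <- 100 + 0.1 * houses[["LIVING AREA"]] +
  15 * houses$FIREPLACE + rnorm(n, sd = 20)

# Make the variable names syntactically valid in R ("TOTAL VALUE" -> "TOTAL.VALUE").
names(houses) <- make.names(names(houses))

# Treat count-like variables as categorical so each level gets its own coefficient.
categorical <- c("FLOORS", "FIREPLACE")
houses[categorical] <- lapply(houses[categorical], factor)

# Initial model: every available variable as a predictor.
full_model <- lm(TOTAL.VALUE ~ ., data = houses)

# Second model: the same predictors without ROOMS and BEDROOMS.
reduced_model <- lm(TOTAL.VALUE ~ . - ROOMS - BEDROOMS, data = houses)

# Coefficient table (estimate, standard error, t value, p-value).
print(summary(reduced_model)$coefficients)
```

Because `FLOORS` and `FIREPLACE` are factors, the coefficient table has one row per non-reference level, which is what makes it possible for individual levels to differ in significance.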
Florida Lakes
The Florida Lakes water quality dataset contains several measures for each lake, such as pH, alkalinity, calcium, and chlorophyll. These features can all be examined to evaluate the general health of a lake. The other Florida Lakes dataset contains the shapefiles of the lakes, including features such as total area, perimeter, and the county in which each lake lies. A challenge that came up while working with these datasets was matching the lakes in the water quality data to the lakes in the shapefiles. The only variable that could be used as a key for joining was the name of the lake, and the two datasets used slightly different naming conventions. Additionally, only the shapefile data recorded which county each lake is in, and many lakes share a name but lie in different counties. For this reason, the author elected to focus only on Polk County lakes and simply try to match the names.
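One way to approach such a name-based join is sketched below in base R. It is an illustration under stated assumptions, not the report's actual join: the two small tables, their column names (`lake`, `ph`, `NAME`, `COUNTY`), and the helper `normalize_name()` are all hypothetical, and the naming differences shown are invented examples.

```r
# Hypothetical water quality table: lake names only, no county.
quality <- data.frame(
  lake = c("Lake Parker", "HOLLINGSWORTH LAKE", "Crystal Lake"),
  ph   = c(8.4, 7.9, 7.1),
  stringsAsFactors = FALSE
)

# Hypothetical shapefile attribute table: lake names plus county.
shapes <- data.frame(
  NAME   = c("Parker, Lake", "Lake Hollingsworth", "Crystal Lake", "Crystal Lake"),
  COUNTY = c("POLK", "POLK", "POLK", "ORANGE"),
  stringsAsFactors = FALSE
)

# Reduce a lake name to a comparable key: lower case, no punctuation,
# the word "lake" dropped wherever it appears, whitespace collapsed.
normalize_name <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("\\blake\\b", " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

quality$key <- normalize_name(quality$lake)
shapes$key  <- normalize_name(shapes$NAME)

# Restrict the shapefile side to Polk County, then join on the key.
polk   <- shapes[shapes$COUNTY == "POLK", ]
joined <- merge(polk, quality, by = "key")

# Flag keys that occur in more than one county statewide: for these,
# the water quality row may belong to a lake outside Polk County.
county_count <- tapply(shapes$COUNTY, shapes$key, function(x) length(unique(x)))
ambiguous    <- names(county_count)[county_count > 1]
joined$ambiguous <- joined$key %in% ambiguous

print(joined)
```

Flagging ambiguous names does not resolve them, but it shows which matched lakes should be treated with caution.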
Visualizations
West Roxbury
The coefficients of the final fitted model are presented here along with their associated metrics. Coefficients with a p-value less than 0.05 are considered significant.
This visualization presents each coefficient along with its estimate. Estimates to the left of the ‘zero’ line are associated with lower house prices, while estimates to the right are associated with higher house prices.
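The significance rule and the reading of the sign of each estimate can be expressed directly on a fitted model's coefficient table. The sketch below uses synthetic data with hypothetical variables `x1`, `x2`, and `y`, so its numbers have nothing to do with the housing model; only the 0.05 threshold and the left-or-right-of-zero reading come from the text above.

```r
# Synthetic data: x1 has a true effect on y, x2 has none by construction.
set.seed(42)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 3 + 2 * d$x1 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = d)

# summary()$coefficients is a matrix with one row per coefficient.
coef_table <- as.data.frame(summary(fit)$coefficients)
names(coef_table) <- c("estimate", "std_error", "t_value", "p_value")

# Significance at the 0.05 level, and which side of zero the estimate falls on.
coef_table$significant <- coef_table$p_value < 0.05
coef_table$direction   <- ifelse(coef_table$estimate < 0, "lowers", "raises")

print(coef_table)
```

A table in this shape can feed a coefficient plot directly, with `estimate` on the horizontal axis and a reference line at zero.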
Polk County Lakes
Polk County lakes are shown above, visualizing average mercury, pH, chlorophyll, and calcium. Lakes with a pH between roughly 8 and 9 appear to have less mercury and more chlorophyll than lakes with a lower pH.
The Polk County lakes that had water quality data available are shown above. Lake Parker, the largest lake shown, appears to have much more algae than the other lakes.
Conclusions
The datasets used here make for good demonstrations of visualizing multiple linear regression, interactive plots, and spatial visualizations. Another idea considered was fitting a multiple linear regression model to the lakes dataset to predict chlorophyll, but this was left out in favor of the regression on the housing dataset. Future iterations of this work should revisit the join performed on the lake datasets: because lakes were matched by name alone, water quality attributes may have been assigned to a Polk County lake that shares its name with a lake in a different county. If another dataset were found that included both water quality metrics and the county of each lake, a more accurate analysis could be performed.